Adversarial Machine Learning-Based Anticipation of Threats Against Vehicle-to-Microgrid Services
In this paper, we study the expanding attack surface of Adversarial Machine
Learning (AML) and the potential attacks against Vehicle-to-Microgrid (V2M)
services. We present an anticipatory study of a multi-stage gray-box attack
that can achieve a comparable result to a white-box attack. Adversaries aim to
deceive the targeted Machine Learning (ML) classifier at the network edge to
misclassify the incoming energy requests from microgrids. With an inference
attack, an adversary can collect real-time data from the communication between
smart microgrids and a 5G gNodeB to train a surrogate (i.e., shadow) model of
the targeted classifier at the edge. To anticipate the associated impact of an
adversary's capability to collect real-time data instances, we study five
different cases, each representing different amounts of real-time data
instances collected by an adversary. Out of six ML models trained on the
complete dataset, K-Nearest Neighbour (K-NN) is selected as the surrogate
model, and through simulations, we demonstrate that the multi-stage gray-box
attack is able to mislead the ML classifier and cause an Evasion Increase Rate
(EIR) up to 73.2% using 40% less data than what a white-box attack needs to
achieve a similar EIR.
Comment: IEEE Global Communications Conference (Globecom), 2022, 6 pages, 2 figures, 4 tables
A Comparative Study of AI-Based Intrusion Detection Techniques in Critical Infrastructures
Volunteer computing, in which owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing resources, has become an essential mechanism for resource management in numerous applications. The growth of the volume and variety of data traffic on the Internet raises concerns about the robustness of cyber-physical systems, especially for critical infrastructures. Therefore, the implementation of an efficient Intrusion Detection System for gathering such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning and reinforcement learning solutions to recognise intrusive behavior in the collected traffic. We evaluate the proposed mechanisms by using KDD'99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and Q-learning based IDS (Q-IDS), to detect malicious behaviors. We also present the performance of different reinforcement learning techniques such as State-Action-Reward-State-Action Learning (SARSA) and the Temporal Difference learning (TD). Through simulations, we show that Q-IDS performs with detection rate while SARSA-IDS and TD-IDS perform at the order of
A Novel Ensemble Method for Advanced Intrusion Detection in Wireless Sensor Networks
© 2020 IEEE. With the increase of cyber attack risks on critical infrastructures monitored by networked systems, robust Intrusion Detection Systems (IDSs) for protecting the information have become vital. Designing an IDS that performs with maximum accuracy and minimum false alarms is a challenging task. The ensemble method, considered one of the main developments in machine learning in the past decade, finds an accurate classifier by combining many classifiers. In this paper, an ensemble classification procedure is proposed using Random Forest (RF), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Restricted Boltzmann Machine (RBM) as base classifiers. RF, DBSCAN, and RBM techniques have been used for classification purposes. The ensemble model is introduced for achieving better results. Bayesian Combination Classification (BCC) has been adopted as a combination technique. Independent BCC (IBCC) and Dependent BCC (DBCC) have been tested for performance comparison. The model shows promising results for all classes of attacks. DBCC outperforms IBCC in terms of accuracy and detection rates. Through simulations under a wireless sensor network scenario, we have verified that the DBCC-based IDS achieves approximately 100% detection rate and approximately 1.0 accuracy in the presence of intrusive behavior in the tested Wireless Sensor Network (WSN).
Anchor-Assisted and Vote-Based Trustworthiness Assurance in Smart City Crowdsensing
Smart city sensing calls for crowdsensing via mobile devices that are equipped with various built-in sensors. As incentivizing users to participate in distributed sensing is still an open research issue, the trustworthiness of crowdsensed data is expected to be a grand challenge if this cloud-inspired recruitment of sensing services is to be adopted. Recent research proposes reputation-based user recruitment models for crowdsensing; however, there is no standard way of identifying adversaries in smart city crowdsensing. This paper adopts previously proposed vote-based approaches, and presents a thorough performance study of vote-based trustworthiness with trusted entities that are a subset of the participating smartphone users. Those entities are called trustworthy anchors of the crowdsensing system. Thus, an anchor user is fully trustworthy and is fully capable of voting for the trustworthiness of other users who participate in sensing of the same set of phenomena. Besides the anchors, the reputations of regular users are determined based on vote-based (distributed) reputation. We present a detailed performance study of the anchor-based trustworthiness assurance in smart city crowdsensing through simulations, and compare it with the purely vote-based trustworthiness approach without anchors, and a reputation-unaware crowdsensing approach, where user reputations are discarded. Through simulation findings, we aim to provide specifications regarding the impact of anchor and adversary populations on crowdsensing and user utilities under various environmental settings. We show that significant improvement can be achieved in terms of usefulness and trustworthiness of the crowdsensed data if the size of the anchor population is set properly.
On Cropped versus Uncropped Training Sets in Tabular Structure Detection
Automated document processing for tabular information extraction is highly
desired in many organizations, from industry to government. Prior works have
addressed this problem under table detection and table structure detection
tasks. Proposed solutions leveraging deep learning approaches have been giving
promising results in these tasks. However, the impact of dataset structures on
table structure detection has not been investigated. In this study, we provide
a comparison of table structure detection performance with cropped and
uncropped datasets. The cropped set consists of only table images that are
cropped from documents assuming tables are detected perfectly. The uncropped
set consists of regular document images. Experiments show that deep learning
models can improve the detection performance by up to 9% in average precision
and average recall on the cropped versions. Furthermore, the impact of cropped
images is negligible under the Intersection over Union (IoU) values of 50%-70%
when compared to the uncropped versions. However, beyond 70% IoU thresholds,
cropped datasets provide significantly higher detection performance.
Multidomain transformer-based deep learning for early detection of network intrusion
Timely response of Network Intrusion Detection Systems (NIDS) is constrained
by the flow generation process which requires accumulation of network packets.
This paper introduces Multivariate Time Series (MTS) early detection into NIDS
to identify malicious flows prior to their arrival at target systems. With this
in mind, we first propose a novel feature extractor, Time Series Network Flow
Meter (TS-NFM), that represents network flow as MTS with explainable features,
and a new benchmark dataset is created using TS-NFM and the meta-data of
CICIDS2017, called SCVIC-TS-2022. Additionally, a new deep learning-based early
detection model called Multi-Domain Transformer (MDT) is proposed, which
incorporates the frequency domain into Transformer. This work further proposes
a Multi-Domain Multi-Head Attention (MD-MHA) mechanism to improve the ability
of MDT to extract better features. Based on the experimental results, the
proposed methodology improves the earliness of the conventional NIDS (i.e.,
percentage of packets that are used for classification) by a factor of 5x10^4 and
duration-based earliness (i.e., percentage of duration of the classified
packets of a flow) by a factor of 60, resulting in an 84.1% macro F1 score (31%
higher than Transformer) on SCVIC-TS-2022. Additionally, the proposed MDT
outperforms the state-of-the-art early detection methods by 5% and 6% on ECG
and Wafer datasets, respectively.
Comment: 6 pages, 7 figures, 3 tables, IEEE Global Communications Conference (Globecom) 202
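The two earliness notions quoted in the abstract above (percentage of packets used for classification, and percentage of the flow duration covered by those packets) can be illustrated with a minimal sketch. The function names, the example timestamps, and the handling of zero-duration flows are assumptions made for illustration; the abstract does not specify how the authors compute these values.

```python
from typing import Sequence


def packet_earliness(packets_used: int, total_packets: int) -> float:
    """Percentage of a flow's packets consumed before the classifier decides."""
    if not 1 <= packets_used <= total_packets:
        raise ValueError("packets_used must be between 1 and total_packets")
    return 100.0 * packets_used / total_packets


def duration_earliness(timestamps: Sequence[float], packets_used: int) -> float:
    """Percentage of the flow's duration spanned by the packets that were used."""
    if not 1 <= packets_used <= len(timestamps):
        raise ValueError("packets_used must be between 1 and len(timestamps)")
    total = timestamps[-1] - timestamps[0]
    if total <= 0:
        # Assumption: a zero-duration flow is treated as fully observed.
        return 100.0
    elapsed = timestamps[packets_used - 1] - timestamps[0]
    return 100.0 * elapsed / total


# Hypothetical flow: arrival times (in seconds) of five packets.
flow = [0.0, 0.5, 1.0, 4.0, 10.0]
print(packet_earliness(2, len(flow)))  # 40.0
print(duration_earliness(flow, 2))     # 5.0
```

Under this reading, lower values mean an earlier decision: classifying after two of five packets uses 40% of the packets but only 5% of the flow's duration.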
Table Detection for Visually Rich Document Images
Table Detection (TD) is a fundamental task towards visually rich document
understanding. Current studies usually formulate the TD problem as an object
detection problem, then leverage Intersection over Union (IoU) based metrics to
evaluate the model performance and IoU-based loss functions to optimize the
model. TD applications usually require the prediction results to cover all the
table contents and avoid information loss. However, IoU and IoU-based loss
functions cannot directly reflect the degree of information loss for the
prediction results. Therefore, we propose to decouple IoU into a ground truth
coverage term and a prediction coverage term, in which the former can be used
to measure the information loss of the prediction results.
Besides, tables in the documents are usually large, sparsely distributed, and
have no overlaps because they are designed to summarize essential information
to make it easy to read and interpret for human readers. Therefore, in this
study, we use SparseR-CNN as the base model, and further improve the model by
using Gaussian Noise Augmented Image Size region proposals and many-to-one
label assignments.
To demonstrate the effectiveness of the proposed method and compare with
state-of-the-art methods fairly, we conduct experiments and use IoU-based
evaluation metrics to evaluate the model performance. The experimental results
show that the proposed method can consistently outperform state-of-the-art
methods under different IoU-based metrics on a variety of datasets. We conduct
further experiments to show the superiority of the proposed decoupled IoU for
the TD applications by replacing the IoU-based loss functions and evaluation
metrics with the proposed decoupled IoU counterparts. The experimental results show
that our proposed decoupled IoU loss can encourage the model to alleviate
information loss.
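The idea of decoupling IoU into a ground truth coverage term and a prediction coverage term can be sketched as follows. This is one plausible reading, not the paper's formulation: it assumes ground truth coverage is the intersection area divided by the ground truth area, and prediction coverage is the intersection area divided by the predicted area. The `Box` type and function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given by its top-left and bottom-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(b: Box) -> float:
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def intersection(a: Box, b: Box) -> float:
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def iou(gt: Box, pred: Box) -> float:
    inter = intersection(gt, pred)
    union = area(gt) + area(pred) - inter
    return inter / union if union > 0 else 0.0


def gt_coverage(gt: Box, pred: Box) -> float:
    """Assumed definition: share of the ground truth table that is covered."""
    a = area(gt)
    return intersection(gt, pred) / a if a > 0 else 0.0


def pred_coverage(gt: Box, pred: Box) -> float:
    """Assumed definition: share of the prediction that lies on the table."""
    a = area(pred)
    return intersection(gt, pred) / a if a > 0 else 0.0


table = Box(0, 0, 10, 10)
too_small = Box(0, 0, 10, 5)   # cuts off half of the table
too_large = Box(0, 0, 20, 10)  # covers the table plus extra background

for pred in (too_small, too_large):
    print(iou(table, pred), gt_coverage(table, pred), pred_coverage(table, pred))
# 0.5 0.5 1.0
# 0.5 1.0 0.5
```

Both predictions score the same IoU, yet only the first one loses table content. That difference shows up in the ground truth coverage term, which is the distinction the abstract draws.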
Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach
Due to the characteristics of Information and Communications Technology (ICT)
products, the critical information of ICT devices is often summarized in big
tabular data shared across supply chains. Therefore, it is critical to
automatically interpret tabular structures with the surging amount of
electronic assets. To transform the tabular data in electronic documents into a
machine-interpretable format and provide layout and semantic information for
information extraction and interpretation, we define a Table Structure
Recognition (TSR) task and a Table Cell Type Classification (CTC) task. We use
a graph to represent complex table structures for the TSR task. Meanwhile,
table cells are categorized into three groups based on their functional roles
for the CTC task, namely Header, Attribute, and Data. Subsequently, we propose
a multi-task model to solve the two defined tasks simultaneously by using
text-modality and image-modality features. Our experimental results show that our
proposed method can outperform state-of-the-art methods on ICDAR2013 and UNLV
datasets.
Comment: 6 pages, 7 tables, 4 figures, IEEE Global Communications Conference (Globecom), 202
Quantifying User Reputation Scores, Data Trustworthiness, and User Incentives in Mobile Crowd-Sensing
The ubiquity of mobile devices with rich sensory capabilities has given rise to the mobile crowd-sensing (MCS) concept, in which a central authority (the platform) and its participants (mobile users) work collaboratively to acquire sensory data over a wide geographic area. Recent research in MCS highlights the following facts: 1) a utility metric can be defined for both the platform and the users, quantifying the value received by either side; 2) incentivizing the users to participate is a non-trivial challenge; 3) correctness and truthfulness of the acquired data must be verified, because the users might provide incorrect or inaccurate data, whether due to malicious intent or malfunctioning devices; and 4) an intricate relationship exists among platform utility, user utility, user reputation, and data trustworthiness, suggesting a co-quantification of these inter-related metrics. In this paper, we study two existing approaches that quantify crowd-sensed data trustworthiness, based on statistical and vote-based user reputation scores. We introduce a new metric - collaborative reputation scores - to expand this definition. Our simulation results show that collaborative reputation scores can provide an effective alternative to the previously proposed metrics and are able to extend crowd sensing to applications that are driven by centralized as well as decentralized control.
Collaborative Feature Maps of Networks and Hosts for AI-driven Intrusion Detection
Intrusion Detection Systems (IDS) are critical security mechanisms that
protect against a wide variety of network threats and malicious behaviors on
networks or hosts. As both Network-based IDS (NIDS) and Host-based IDS (HIDS)
have been widely investigated, this paper aims to present a Combined Intrusion
Detection System (CIDS) that integrates network and host data in order to
improve IDS performance. Due to the scarcity of datasets that include both
network packet and host data, we present a novel CIDS dataset formation
framework that can handle log files from a variety of operating systems and
align log entities with network flows. A new CIDS dataset named SCVIC-CIDS-2021
is derived from the meta-data from the well-known benchmark dataset,
CIC-IDS-2018 by utilizing the proposed framework. Furthermore, a
transformer-based deep learning model named CIDS-Net is proposed that can take
network flow and host features as inputs and outperform baseline models that
rely on network flow features only. Experimental results to evaluate the
proposed CIDS-Net under the SCVIC-CIDS-2021 dataset support the hypothesis for
the benefits of combining host and flow features as the proposed CIDS-Net can
improve the macro F1 score of baseline solutions by 6.36% (up to 99.89%).
Comment: IEEE Global Communications Conference (Globecom), 2022, 6 pages, 3 figures, 4 tables
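The macro F1 score reported above is the unweighted mean of per-class F1 scores, so rare attack classes count as much as the benign majority. A minimal sketch of that standard definition follows; the labels are hypothetical and not drawn from SCVIC-CIDS-2021.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0.
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)


# Hypothetical predictions for four flows.
y_true = ["benign", "benign", "benign", "attack"]
y_pred = ["benign", "benign", "attack", "attack"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Here the benign class scores 0.8 and the attack class about 0.667, so the macro average is about 0.733 even though three of the four flows are benign.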